Sanity check for trained model
- load a saved model
- make a tiny test dataset with 10 examples from each class
- make predictions on these test examples
- display the spectrograms, audio and model prediction for each of the examples.
This report shows the model's prediction vs. ground truth for a few examples. In each example, the Olive-sided Flycatcher (OSFL) is either present or absent:
- Present - The 3 s clip of audio contains a complete human-labelled tag of the OSFL.
- Absent - The example is audio taken from before the start of a human-labelled OSFL tag.
The scores are computed on a validation set, which is defined as:
- The unseen audio from 20% of the ARU locations in the training set. The model has not been trained on audio from these locations.
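A location-based hold-out like the one described above could be sketched as follows. This is only an illustration, not the code used to build this dataset: the `location_id` field, the helper name `split_by_location` and the toy clips are all hypothetical.

```python
import random

def split_by_location(clips, valid_fraction=0.2, seed=0):
    """Hold out whole locations so that no location appears in both sets."""
    # `location_id` is a hypothetical field name used for illustration only.
    locations = sorted({clip["location_id"] for clip in clips})
    rng = random.Random(seed)
    rng.shuffle(locations)
    n_valid = max(1, round(len(locations) * valid_fraction))
    valid_locations = set(locations[:n_valid])
    train = [c for c in clips if c["location_id"] not in valid_locations]
    valid = [c for c in clips if c["location_id"] in valid_locations]
    return train, valid

# Toy data: 10 made-up locations with 3 clips each.
clips = [
    {"location_id": loc, "clip": f"loc{loc}_clip{i}"}
    for loc in range(10)
    for i in range(3)
]
train, valid = split_by_location(clips)
print(len(train), len(valid))
```

Splitting on locations rather than on individual clips is what keeps audio from a validation location out of training.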
Some further processing needs to be applied to this dataset:
- Mix in audio from other times of day, and other habitats. This is to ensure that the training data contains as much variety as possible.
- Replace the human-labelled audio samples with high-scoring ones picked out by the HawkEars model; these should all end up being focal recordings. This is assumed to produce a relationship between sound power and recognizer score, which would enable density estimation and other downstream statistical applications.
- Bandpass the input signal to remove sounds at frequencies outside the OSFL vocalization range.
- Go through the audio samples in the validation set and remove any which are obviously labelled incorrectly, and flag those which are borderline.
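To make the bandpass step in the list above concrete, here is a minimal sketch using a brick-wall FFT filter on a synthetic signal. It is a simplified stand-in, not the filter this project will use, and the 2000-5000 Hz band is a placeholder rather than a measured OSFL vocalization range. The helper name `bandpass_fft` is hypothetical.

```python
import numpy as np

def bandpass_fft(samples, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz] (brick-wall filter)."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * keep, n=len(samples))

# Synthetic 3 s clip: a 500 Hz tone (out of band) plus a 3000 Hz tone (in band).
sample_rate = 44100
t = np.arange(3 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3000 * t)

# Placeholder band limits, NOT taken from this notebook.
filtered = bandpass_fft(clip, sample_rate, low_hz=2000, high_hz=5000)
```

After filtering, only the 3000 Hz component should remain. A real pipeline would more likely use a smooth filter (for example a Butterworth design), since a brick-wall filter can introduce ringing on real recordings.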
In [ ]:
# imports
import sys
from pathlib import Path

import pandas as pd

# Make the project root importable from this notebook's directory
BASE_PATH = Path.cwd().parent.parent
sys.path.append(str(BASE_PATH))

import opensoundscape as opso
from opensoundscape import Audio, Spectrogram
In [ ]:
# Load the validation set
train_valid_set_path = BASE_PATH / 'data' / 'interim' / 'train_and_valid_set'
valid_set = pd.read_pickle(train_valid_set_path / 'valid_ds_sample_size_0.1.pkl')
valid_set.sample(5)
Out[ ]:
| file | start_time | end_time | target_presence | target_absence |
|---|---|---|---|---|
| ../../data/raw/recordings/OSFL/recording-816784.flac | 12.0 | 15.0 | 0.0 | 1.0 |
| ../../data/raw/recordings/OSFL/recording-291508.mp3 | 96.0 | 99.0 | 0.0 | 1.0 |
| ../../data/raw/recordings/OSFL/recording-815818.flac | 15.0 | 18.0 | 0.0 | 1.0 |
| ../../data/raw/recordings/OSFL/recording-481585.flac | 169.5 | 172.5 | 0.0 | 1.0 |
| ../../data/raw/recordings/OSFL/recording-291730.mp3 | 220.5 | 223.5 | 0.0 | 1.0 |
In [ ]:
# Take a sample of 10 from each class
present_samples = valid_set.loc[valid_set['target_presence']==1].sample(10)
absent_samples = valid_set.loc[valid_set['target_absence']==1].sample(10)
present_samples.index[0]
Out[ ]:
(PosixPath('../../data/raw/recordings/OSFL/recording-291576.mp3'), 0.0, 3.0)
In [ ]:
# Load a trained model.
model = opso.cnn.load_model(BASE_PATH / 'models' / 'best.model')
model.valid_metrics
Out[ ]:
{0: {'confusion_matrix': array([[ 914, 69],
[8978, 1541]]),
'precision': 0.9571428571428572,
'recall': 0.1464968152866242,
'f1': 0.2541017396322862,
'jaccard': 0.11864998939763832,
'hamming_loss': 0.7865588593288124},
1: {'confusion_matrix': array([[ 948, 35],
[9303, 1216]]),
'precision': 0.9720223820943246,
'recall': 0.11560034223785531,
'f1': 0.20662701784197113,
'jaccard': 0.10369054294846008,
'hamming_loss': 0.8118588071639715},
2: {'confusion_matrix': array([[ 914, 69],
[6383, 4136]]),
'precision': 0.9835909631391201,
'recall': 0.3931932693221789,
'f1': 0.5618038576473784,
'jaccard': 0.2573572651932767,
'hamming_loss': 0.5609459224482699},
3: {'confusion_matrix': array([[ 890, 93],
[4288, 6231]]),
'precision': 0.9852941176470589,
'recall': 0.5923566878980892,
'f1': 0.7398919432405153,
'jaccard': 0.37800694445487304,
'hamming_loss': 0.3808902799513128},
4: {'confusion_matrix': array([[ 813, 170],
[1501, 9018]]),
'precision': 0.9814976055724859,
'recall': 0.8573058275501474,
'f1': 0.9152077941848075,
'jaccard': 0.5854828748503473,
'hamming_loss': 0.1452790818988002}}
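The precision, recall and F1 values printed above can be reproduced from each confusion matrix if one assumes rows are true labels and columns are predicted labels, with the negative class first. The sketch below checks this for the entry keyed `4`; the helper name `binary_metrics` is hypothetical and is not part of OpenSoundscape.

```python
def binary_metrics(confusion_matrix):
    """Precision, recall and F1 from a 2x2 matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = confusion_matrix
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Confusion matrix copied from the entry keyed 4 in model.valid_metrics.
precision, recall, f1 = binary_metrics([[813, 170], [1501, 9018]])
print(precision, recall, f1)
```

Under this layout, recall rises from about 0.15 in entry `0` to about 0.86 in entry `4`, while precision stays above 0.95 throughout.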
In [ ]:
model.predict(valid_set, batch_size=32)
Out[ ]:
| file | start_time | end_time | target_presence | target_absence |
|---|---|---|---|---|
| ../../data/raw/recordings/OSFL/recording-4819.mp3 | 0.0 | 3.0 | -1.017535 | 1.184056 |
| | 1.5 | 4.5 | -2.782332 | 2.765101 |
| | 3.0 | 6.0 | -1.430539 | 1.502954 |
| | 4.5 | 7.5 | -1.984744 | 1.893654 |
| | 6.0 | 9.0 | -2.831836 | 2.863616 |
| ... | ... | ... | ... | ... |
| ../../data/raw/recordings/OSFL/recording-826279.flac | 4.5 | 7.5 | -3.197888 | 3.422475 |
| | 7.5 | 10.5 | -0.686746 | 1.046495 |
| ../../data/raw/recordings/OSFL/recording-826374.flac | 0.0 | 3.0 | -3.158400 | 3.226395 |
| | 3.0 | 6.0 | -2.232522 | 2.313747 |
| | 15.0 | 18.0 | 6.799380 | -7.086478 |
1617 rows × 2 columns
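The values in the table above are raw, unbounded model outputs, which is why they can be negative or greater than 1. A sigmoid maps each of them into the 0 to 1 range used by the predictions later in this notebook. A minimal sketch of that mapping, applied to two `target_presence` values copied from the table:

```python
import math

def sigmoid(raw_score):
    """Map an unbounded raw score to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))

# target_presence values from the first and last rows of the table above.
raw_scores = [-1.017535, 6.799380]
scores = [sigmoid(x) for x in raw_scores]
print(scores)
```

A raw score of 0 corresponds to 0.5, so negative raw scores lean towards "absent" and positive ones towards "present".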
In [ ]:
present_preds = model.predict(present_samples, activation_layer='sigmoid')
In [ ]:
absent_preds = model.predict(absent_samples, activation_layer='sigmoid')
In [ ]:
# rename columns for better clarity after they're combined
present_samples.rename(columns = {'target_presence':'present_label', 'target_absence':'absent_label'}, inplace = True)
present_preds.rename(columns = {'target_presence':'present_pred', 'target_absence':'absent_pred'}, inplace = True)
absent_samples.rename(columns = {'target_presence':'present_label', 'target_absence':'absent_label'}, inplace = True)
absent_preds.rename(columns = {'target_presence':'present_pred', 'target_absence':'absent_pred'}, inplace = True)
# combine labels and predictions for samples of present and absent classes
present_labels_and_preds = pd.concat([present_samples, present_preds], axis=1)
absent_labels_and_preds = pd.concat([absent_samples, absent_preds], axis=1)
combined = pd.concat([present_labels_and_preds, absent_labels_and_preds], axis=0)
combined
Out[ ]:
| file | start_time | end_time | present_label | absent_label | present_pred | absent_pred |
|---|---|---|---|---|---|---|
| ../../data/raw/recordings/OSFL/recording-291576.mp3 | 0.0 | 3.0 | 1.0 | 0.0 | 0.995771 | 0.002882 |
| ../../data/raw/recordings/OSFL/recording-291508.mp3 | 295.5 | 298.5 | 1.0 | 0.0 | 0.990465 | 0.007913 |
| ../../data/raw/recordings/OSFL/recording-552659.flac | 13.5 | 16.5 | 1.0 | 0.0 | 0.882287 | 0.120558 |
| ../../data/raw/recordings/OSFL/recording-292300.mp3 | 3.0 | 6.0 | 1.0 | 0.0 | 0.741774 | 0.274831 |
| ../../data/raw/recordings/OSFL/recording-553501.flac | 177.0 | 180.0 | 1.0 | 0.0 | 0.526428 | 0.483489 |
| ../../data/raw/recordings/OSFL/recording-292035.mp3 | 28.5 | 31.5 | 1.0 | 0.0 | 0.999022 | 0.000715 |
| ../../data/raw/recordings/OSFL/recording-294423.mp3 | 3.0 | 6.0 | 1.0 | 0.0 | 0.999152 | 0.000562 |
| ../../data/raw/recordings/OSFL/recording-295299.mp3 | 18.0 | 21.0 | 1.0 | 0.0 | 0.997536 | 0.001722 |
| ../../data/raw/recordings/OSFL/recording-292249.mp3 | 6.0 | 9.0 | 1.0 | 0.0 | 0.995068 | 0.002990 |
| ../../data/raw/recordings/OSFL/recording-104311.mp3 | 130.5 | 133.5 | 1.0 | 0.0 | 0.998575 | 0.001363 |
| ../../data/raw/recordings/OSFL/recording-554028.flac | 10.5 | 13.5 | 0.0 | 1.0 | 0.015250 | 0.984797 |
| ../../data/raw/recordings/OSFL/recording-291508.mp3 | 217.5 | 220.5 | 0.0 | 1.0 | 0.169071 | 0.816377 |
| ../../data/raw/recordings/OSFL/recording-481585.flac | 97.5 | 100.5 | 0.0 | 1.0 | 0.562846 | 0.408219 |
| ../../data/raw/recordings/OSFL/recording-296785.mp3 | 22.5 | 25.5 | 0.0 | 1.0 | 0.049864 | 0.950873 |
| ../../data/raw/recordings/OSFL/recording-553491.flac | 99.0 | 102.0 | 0.0 | 1.0 | 0.074578 | 0.927464 |
| ../../data/raw/recordings/OSFL/recording-291730.mp3 | 69.0 | 72.0 | 0.0 | 1.0 | 0.015597 | 0.985294 |
| ../../data/raw/recordings/OSFL/recording-292071.mp3 | 43.5 | 46.5 | 0.0 | 1.0 | 0.057902 | 0.951844 |
| ../../data/raw/recordings/OSFL/recording-554028.flac | 19.5 | 22.5 | 0.0 | 1.0 | 0.667904 | 0.336344 |
| ../../data/raw/recordings/OSFL/recording-553501.flac | 34.5 | 37.5 | 0.0 | 1.0 | 0.074124 | 0.937287 |
| ../../data/raw/recordings/OSFL/recording-815882.flac | 36.0 | 39.0 | 0.0 | 1.0 | 0.047265 | 0.958468 |
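As a quick summary of the table above, the sketch below tallies a confusion matrix for these 20 samples. The 0.5 threshold is an assumption for illustration; the notebook does not specify a decision threshold. The `(present_label, present_pred)` pairs are copied from the table.

```python
THRESHOLD = 0.5  # assumed cut-off, not specified in this notebook

# (present_label, present_pred) pairs copied from the combined table.
rows = [
    (1, 0.995771), (1, 0.990465), (1, 0.882287), (1, 0.741774), (1, 0.526428),
    (1, 0.999022), (1, 0.999152), (1, 0.997536), (1, 0.995068), (1, 0.998575),
    (0, 0.015250), (0, 0.169071), (0, 0.562846), (0, 0.049864), (0, 0.074578),
    (0, 0.015597), (0, 0.057902), (0, 0.667904), (0, 0.074124), (0, 0.047265),
]

def confusion_counts(rows, threshold):
    """Count true/false positives and negatives at the given threshold."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for label, score in rows:
        predicted_present = score >= threshold
        if label == 1:
            counts["tp" if predicted_present else "fn"] += 1
        else:
            counts["fp" if predicted_present else "tn"] += 1
    return counts

counts = confusion_counts(rows, THRESHOLD)
print(counts)  # {'tp': 10, 'fn': 0, 'fp': 2, 'tn': 8}
```

At this threshold all ten present clips are detected, and two absent clips (present_pred 0.562846 and 0.667904) are false positives worth listening to in the examples that follow.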
In [ ]:
def show_example(counter):
    """Show label, prediction, audio and spectrogram for one row of `combined`.

    Returns the index of the next example, so repeated calls step through the table.
    """
    path, offset, end_time = combined.index[counter]
    duration = end_time - offset
    # Load only the labelled clip rather than the whole recording
    audio = Audio.from_file(path, offset=offset, duration=duration)
    spectrogram = Spectrogram.from_audio(audio)
    print(path, offset, end_time)
    print(audio.metadata)
    print(f"Present Prediction = {combined.iloc[counter].present_pred} \nActual = {combined.iloc[counter].present_label}")
    print("Check below")
    audio.show_widget()
    spectrogram.plot()
    counter += 1
    return counter
Show Predictions
See example labels, predictions, audio and spectrograms below.
In [ ]:
next_example_idx = 0
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291576.mp3 0.0 3.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9957714676856995
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291508.mp3 295.5 298.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9904650449752808
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-552659.flac 13.5 16.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 13229824, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.8822866082191467
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292300.mp3 3.0 6.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 7957844, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.7417735457420349
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553501.flac 177.0 180.0
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.5264281034469604
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292035.mp3 28.5 31.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9990221261978149
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-294423.mp3 3.0 6.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9991520643234253
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-295299.mp3 18.0 21.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9975355863571167
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292249.mp3 6.0 9.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 7957844, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.995067834854126
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-104311.mp3 130.5 133.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 7937792, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.9985754489898682
Actual = 1.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-554028.flac 10.5 13.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.015250089578330517
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291508.mp3 217.5 220.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.16907060146331787
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-481585.flac 97.5 100.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 13230000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.5628458857536316
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-296785.mp3 22.5 25.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 2653384, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.049863532185554504
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553491.flac 99.0 102.0
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.07457751035690308
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-291730.mp3 69.0 72.0
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.015596892684698105
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-292071.mp3 43.5 46.5
{'samplerate': 44100, 'format': 'MP3', 'frames': 13259997, 'sections': 1, 'subtype': 'MPEG_LAYER_III', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.05790164694190025
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-554028.flac 19.5 22.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.6679044961929321
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-553501.flac 34.5 37.5
{'comment': 'Processed by SoX', 'samplerate': 44100, 'format': 'FLAC', 'frames': 7938000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.07412396371364594
Actual = 0.0
Check below
In [ ]:
next_example_idx = show_example(next_example_idx)
../../data/raw/recordings/OSFL/recording-815882.flac 36.0 39.0
{'comment': 'Processed by SoX', 'samplerate': 32000, 'format': 'FLAC', 'frames': 19200000, 'sections': 1, 'subtype': 'PCM_16', 'channels': 1, 'duration': 3.0, 'filesize': nan}
Present Prediction = 0.04726511985063553
Actual = 0.0
Check below
Further work for this notebook:
- show confusion matrix
- plot the examples with the highest losses.
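One possible starting point for the second item is to rank examples by a per-example loss. The sketch below assumes binary cross-entropy on the `present_pred` score, which may differ from the loss the model was actually trained with. It uses a handful of rows copied from the combined table, and the helper name `bce_loss` is hypothetical.

```python
import math

def bce_loss(label, score, eps=1e-7):
    """Binary cross-entropy for one example, with the score clipped away from 0 and 1."""
    score = min(max(score, eps), 1 - eps)
    return -(label * math.log(score) + (1 - label) * math.log(1 - score))

# (file, start_time, present_label, present_pred) copied from the combined table.
examples = [
    ("recording-291576.mp3", 0.0, 1, 0.995771),
    ("recording-553501.flac", 177.0, 1, 0.526428),
    ("recording-554028.flac", 10.5, 0, 0.015250),
    ("recording-481585.flac", 97.5, 0, 0.562846),
    ("recording-554028.flac", 19.5, 0, 0.667904),
]

# Highest loss first: these are the clips the model is most wrong about.
ranked = sorted(examples, key=lambda ex: bce_loss(ex[2], ex[3]), reverse=True)
for file, start_time, label, score in ranked:
    print(f"{file} @ {start_time}s  label={label}  score={score}  loss={bce_loss(label, score):.4f}")
```

The top of this ranking could then be passed to `show_example`-style plotting, so that the most confidently wrong clips are reviewed first.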